Abstract

In many applications, nodes in a network desire not only a consensus, but an optimal one. To
date, a family of subgradient algorithms has been proposed to solve this problem under general
convexity assumptions. This paper shows that, for the scalar case and by assuming a bit more,
novel non-gradient-based algorithms with appealing features can be constructed. Specifically,
we develop Pairwise Equalizing (PE) and Pairwise Bisectioning (PB), two gossip algorithms
that solve unconstrained, separable, convex consensus optimization problems over undirected
networks with time-varying topologies, where each local function is strictly convex, continuously
differentiable, and has a minimizer. We show that PE and PB are easy to implement, bypass
limitations of the subgradient algorithms, and produce switched, nonlinear, networked dynamical
systems that admit a common Lyapunov function and asymptotically converge. Moreover,
PE generalizes the well-known Pairwise Averaging and Randomized Gossip Algorithm, while
PB relaxes a requirement of PE, allowing nodes to never share their local functions.

1 Introduction 

Consider an $N$-node multi-hop network, where each node $i$ observes a convex function $f_i$, and
all the $N$ nodes wish to determine an optimal consensus $x^*$, which minimizes the sum of the $f_i$'s:

$$x^* \in \arg\min_{x} \sum_{i=1}^{N} f_i(x). \tag{1}$$

Since each node $i$ knows only its own $f_i$, the nodes cannot individually compute the optimal
consensus $x^*$ and, thus, must collaborate to do so. This problem of achieving unconstrained,
separable, convex consensus optimization has many applications in multi-agent systems and
wired/wireless/social networks, some examples of which can be found in [1], [2].

The current literature offers a large body of work on distributed consensus (see [3] for a survey) , 
including a line of research that focuses on solving problem ([I]) for an optimal consensus x* [TJEJI1]- 
117] . This line of work has resulted in a family of discrete-time subgradient algorithms, including the 
incremental subgradient algorithms [Hl2l3H8l [T0lfT5] , whereby an estimate of x* is passed around the 
network, and the non-incremental ones [9] lllfH%] ll6 ] ll7]. whereby each node maintains an estimate 
of x* and updates it iteratively by exchanging information with neighbors. 

Although the aforementioned subgradient algorithms are capable of solving problem (1) under
fairly weak assumptions, they suffer from one or more of the following limitations:

L1. Stepsizes: The algorithms require selection of stepsizes, which may be constant, diminishing,
or dynamic. In general, constant stepsizes ensure only convergence to neighborhoods of x*, 
rather than to x* itself. Moreover, they present an inevitable trade-off: larger stepsizes tend to 
yield larger convergence neighborhoods, while smaller ones tend to yield slower convergence. 
In contrast, diminishing stepsizes typically ensure asymptotic convergence. However, the 
convergence may be very slow, since the stepsizes may diminish too quickly. Finally, dynamic 
stepsizes allow shaping of the convergence behavior [U(6]. Unfortunately, their dynamics 
depend on global information that is often costly to obtain. Hence, selecting appropriate 
stepsizes is not a trivial task, and inappropriate choices can cause poor performance. 

L2. Hamiltonian cycle: Many incremental subgradient algorithms [rj[2"ll3HTlll0tll5] require the 
nodes to construct and maintain a Hamiltonian cycle (i.e., a closed path that visits every 
node exactly once) or a pseudo one (i.e., that allows multiple visits), which may be very 
difficult to carry out, especially in a decentralized, leaderless fashion.

L3. Multi-hop transmissions: Some incremental subgradient algorithms [3HS] require the node 
that has the latest estimate of x* to pass it on to a randomly and equiprobably chosen node 
in the network. This implies that every node must be aware of all the nodes in the network, 
and the algorithms must run alongside a routing protocol that enables such passing, which 
may not always be the case. The fact that the chosen node is typically multiple hops away also 
implies that these algorithms are communication inefficient, requiring plenty of transmissions 
(up to the network diameter) just to complete a single iteration. 

L4. Lack of asymptotic convergence: A variety of convergence properties have been established for 
the subgradient algorithms in (TpP HTTj. including error bounds, convergence in expectations, 
convergence in limit inferiors, convergence rates, etc. In contrast, relatively few asymptotic 
convergence results have been reported, except for the subgradient algorithms with diminish- 
ing or dynamic stepsizes in [3H5|[TU |[TCHTT] . 
Limitations L1-L4 facing the subgradient algorithms raise the question of whether it is possible
to devise algorithms that require neither the notion of a stepsize, nor the construction of a
(pseudo-)Hamiltonian cycle, nor the use of a routing protocol for multi-hop transmissions, and yet
guarantee asymptotic convergence, bypassing L1-L4. In this paper, we show that, for the one-dimensional
case and with a few mild assumptions, such algorithms can be constructed. Specifically, instead
of letting the network be directed, we assume that it is undirected, with possibly a time-varying
topology unknown to any of the nodes. In addition, instead of letting each $f_i$ in (1) be convex
but not necessarily differentiable, we assume that it is strictly convex, continuously differentiable,
and has a minimizer. Based on these assumptions, we develop two gossip-style, distributed
asynchronous iterative algorithms, referred to as Pairwise Equalizing (PE) and Pairwise Bisectioning
(PB), which not only solve problem (1) and circumvent limitations L1-L4, but also are rather easy to
implement, although computationally they are more demanding than the subgradient algorithms.

As will be shown in the paper, PE and PB exhibit a number of notable features. First, they 
produce switched, nonlinear, networked dynamical systems whose state evolves along an invari- 
ant manifold whenever nodes gossip with each other. The switched systems are proved, using 
Lyapunov stability theory, to be asymptotically convergent, as long as the gossiping pattern is 
sufficiently rich. In particular, we show that the first-order convexity condition can be used to 
form a common Lyapunov function, as well as to characterize drops in its value after every gossip. 
Second, PE and PB do not belong to the family of subgradient algorithms, as they utilize
fundamentally different, non-gradient-based update rules that involve no stepsize. These update rules
are synthesized from two simple ideas, conservation and dissipation, which are somewhat similar
to how Pairwise Averaging [18] was conceived back in the 1980s. Indeed, we show that PE reduces
to Pairwise Averaging [18] and Randomized Gossip Algorithm [19] when problem (1) specializes to
an averaging problem. Finally, PE requires one-time sharing of the $f_i$'s between gossiping nodes,
which may be costly or impermissible in some applications. This requirement is eliminated by PB
at the expense of more communications per iteration.

2 Problem Formulation 

Consider a multi-hop network consisting of $N > 2$ nodes, connected by bidirectional links
in a time-varying topology. The network is modeled as an undirected graph
$\mathcal{G}(k) = (\mathcal{V}, \mathcal{E}(k))$, where $k \in \mathbb{N} = \{0, 1, 2, \ldots\}$ denotes time,
$\mathcal{V} = \{1, 2, \ldots, N\}$ represents the set of $N$ nodes, and
$\mathcal{E}(k) \subset \{\{i,j\} : i,j \in \mathcal{V}, i \ne j\}$ represents the nonempty set of links at time $k$.
Any two nodes $i, j \in \mathcal{V}$ are one-hop neighbors and can communicate at time $k \in \mathbb{N}$ if and
only if $\{i,j\} \in \mathcal{E}(k)$.

Suppose, at time $k = 0$, each node $i \in \mathcal{V}$ observes a function $f_i : \mathcal{X} \to \mathbb{R}$, which maps a
nonempty open interval $\mathcal{X} \subset \mathbb{R}$ to $\mathbb{R}$, and which satisfies the following assumption:

Assumption 1. For each $i \in \mathcal{V}$, the function $f_i$ is strictly convex, continuously differentiable, and
has a minimizer $x_i^* \in \mathcal{X}$.



Suppose, upon observing the $f_i$'s, all the $N$ nodes wish to solve the following unconstrained,
separable, convex optimization problem:

$$\min_{x \in \mathcal{X}} F(x), \tag{2}$$

where the function $F : \mathcal{X} \to \mathbb{R}$ is defined as $F(x) = \sum_{i \in \mathcal{V}} f_i(x)$. Clearly, $F$ is strictly convex and
continuously differentiable. To show that $F$ has a unique minimizer in $\mathcal{X}$ so that problem (2) is
well-posed, let $f_i' : \mathcal{X} \to \mathbb{R}$ and $F' : \mathcal{X} \to \mathbb{R}$ denote the derivatives of $f_i$ and $F$, respectively, and
consider the following lemma and proposition:

Lemma 1. Let $g_i : \mathcal{X} \to \mathbb{R}$ be a strictly increasing and continuous function and $z_i \in \mathcal{X}$ for
$i = 1, 2, \ldots, n$. Then, there exists a unique $z \in \mathcal{X}$ such that $\sum_{i=1}^{n} g_i(z) = \sum_{i=1}^{n} g_i(z_i)$. Moreover,
$z \in [\min_{i \in \{1,2,\ldots,n\}} z_i, \max_{i \in \{1,2,\ldots,n\}} z_i]$.

Proof. Since $g_i$ is strictly increasing and continuous $\forall i \in \{1, 2, \ldots, n\}$, so is $\sum_{i=1}^{n} g_i : \mathcal{X} \to \mathbb{R}$.
Thus, $\sum_{i=1}^{n} g_i(\min_{j \in \{1,2,\ldots,n\}} z_j) \le \sum_{i=1}^{n} g_i(z_i) \le \sum_{i=1}^{n} g_i(\max_{j \in \{1,2,\ldots,n\}} z_j)$. It follows from the
Intermediate Value Theorem that there exists a unique $z \in \mathcal{X}$ such that $\sum_{i=1}^{n} g_i(z) = \sum_{i=1}^{n} g_i(z_i)$,
and that $z \in [\min_{i \in \{1,2,\ldots,n\}} z_i, \max_{i \in \{1,2,\ldots,n\}} z_i]$. □

Proposition 1. With Assumption 1, there exists a unique $x^* \in \mathcal{X}$, which satisfies $F'(x^*) = 0$,
minimizes $F$ over $\mathcal{X}$, and solves problem (2), i.e., $x^* = \arg\min_{x \in \mathcal{X}} F(x)$.

Proof. By Assumption 1, for every $i \in \mathcal{V}$, $f_i'$ is strictly increasing and continuous. By Lemma 1,
there exists a unique $x^* \in \mathcal{X}$ such that $\sum_{i \in \mathcal{V}} f_i'(x^*) = \sum_{i \in \mathcal{V}} f_i'(x_i^*)$. Since $F' = \sum_{i \in \mathcal{V}} f_i'$ and
$f_i'(x_i^*) = 0$ $\forall i \in \mathcal{V}$, $F'(x^*) = 0$. Since $F$ is strictly convex, $x^*$ minimizes $F$ over $\mathcal{X}$, solving (2). □
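
As a small illustration of Lemma 1 and Proposition 1, the following Python sketch locates $x^*$ by bisecting on $F'$ over the bracket $[\min_i x_i^*, \max_i x_i^*]$ that Lemma 1 guarantees. The three local functions, the helper name `bisect_root`, and the tolerance are assumptions chosen for illustration only; they do not come from the paper, and a centralized computation like this is of course exactly what the distributed algorithms are meant to avoid.

```python
import math

def bisect_root(h, lo, hi, tol=1e-12, max_iter=200):
    """Root of a continuous, strictly increasing h with h(lo) <= 0 <= h(hi)."""
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if h(mid) >= 0.0:
            hi = mid          # root lies in [lo, mid]
        else:
            lo = mid          # root lies in (mid, hi]
        if hi - lo <= tol:
            break
    return 0.5 * (lo + hi)

# Hypothetical strictly convex local functions (illustrative only):
#   f_1(x) = (x - 1)^2,  f_2(x) = exp(x) - x,  f_3(x) = x^4 + x^2
derivs = [
    lambda x: 2.0 * (x - 1.0),
    lambda x: math.exp(x) - 1.0,
    lambda x: 4.0 * x**3 + 2.0 * x,
]
local_minimizers = [1.0, 0.0, 0.0]   # zeros of the derivatives above

def F_prime(x):
    """Derivative of F = f_1 + f_2 + f_3."""
    return sum(d(x) for d in derivs)

# Lemma 1: the unique zero of F' lies between the smallest and largest x_i^*.
x_star = bisect_root(F_prime, min(local_minimizers), max(local_minimizers))
```

The bracket is valid because every $f_i'$ is nonpositive at the smallest local minimizer and nonnegative at the largest one.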

Given the above, the goal is to construct a distributed asynchronous iterative algorithm free of 
limitations L1-L4, with which each node can asymptotically determine the unknown optimizer x* . 

3 Pairwise Equalizing 

In this section, we develop a gossip algorithm having the aforementioned features. 

Suppose, at time $k = 0$, each node $i \in \mathcal{V}$ creates a state variable $\hat{x}_i \in \mathcal{X}$ in its local memory,
which represents its estimate of $x^*$. Also suppose, at each subsequent time $k \in \mathbb{P} = \{1, 2, \ldots\}$, an
iteration, called iteration $k$, takes place. Let $\hat{x}_i(0)$ represent the initial value of $\hat{x}_i$, and $\hat{x}_i(k)$ its
value upon completing each iteration $k \in \mathbb{P}$. With this setup, the goal may be stated as

$$\lim_{k \to \infty} \hat{x}_i(k) = x^*, \quad \forall i \in \mathcal{V}. \tag{3}$$
To design an algorithm that guarantees (3), consider a conservation condition

$$\sum_{i \in \mathcal{V}} f_i'(\hat{x}_i(k)) = 0, \quad \forall k \in \mathbb{N}, \tag{4}$$

which says that the $\hat{x}_i(k)$'s evolve in a way that the sum of the derivatives $f_i'$'s, evaluated at the
$\hat{x}_i(k)$'s, is always conserved at zero. Moreover, consider a dissipation condition

$$\lim_{k \to \infty} \hat{x}_i(k) = \tilde{x}, \quad \forall i \in \mathcal{V}, \text{ for some } \tilde{x} \in \mathcal{X}, \tag{5}$$

which says that the $\hat{x}_i(k)$'s gradually dissipate their differences and asymptotically achieve some
arbitrary consensus $\tilde{x} \in \mathcal{X}$. Note that if (4) is met, then
$\lim_{k \to \infty} \sum_{i \in \mathcal{V}} f_i'(\hat{x}_i(k)) = \lim_{k \to \infty} 0 = 0$. If, in addition, (5) is met, then due to the
continuity of every $f_i'$, $\sum_{i \in \mathcal{V}} \lim_{k \to \infty} f_i'(\hat{x}_i(k)) = \sum_{i \in \mathcal{V}} f_i'(\lim_{k \to \infty} \hat{x}_i(k)) = \sum_{i \in \mathcal{V}} f_i'(\tilde{x}) = F'(\tilde{x})$.
Because $\lim_{k \to \infty} f_i'(\hat{x}_i(k))$ exists for every $i \in \mathcal{V}$,
$\lim_{k \to \infty} \sum_{i \in \mathcal{V}} f_i'(\hat{x}_i(k)) = \sum_{i \in \mathcal{V}} \lim_{k \to \infty} f_i'(\hat{x}_i(k))$. Combining the above, we obtain $F'(\tilde{x}) = 0$.
From Proposition 1, we see that the arbitrary consensus $\tilde{x}$ must be the unknown optimizer $x^*$, i.e.,
$\tilde{x} = x^*$, so that (3) holds. Therefore, to design an algorithm that ensures (3), where $x^*$ explicitly
appears, it suffices to make the algorithm satisfy both the conservation and dissipation conditions
(4) and (5), where $x^*$ is implicitly encoded.
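
The argument above can be checked by hand on a simple non-quadratic instance. The local functions $f_i(x) = e^{x} - y_i x$ with $y_i > 0$ and $\mathcal{X} = \mathbb{R}$ are a hypothetical choice made only for this illustration; they are strictly convex, continuously differentiable, and each has a minimizer, so Assumption 1 holds. The consensus value is written $\tilde{x}$ here.

```latex
\begin{align*}
f_i'(x) &= e^{x} - y_i, \qquad x_i^* = \ln y_i, \\
\text{conservation (4):}\quad
  & \sum_{i \in \mathcal{V}} e^{\hat{x}_i(k)} = \sum_{i \in \mathcal{V}} y_i
  \quad \forall k \in \mathbb{N}, \\
\text{dissipation (5):}\quad
  & \hat{x}_i(k) \to \tilde{x} \ \ \forall i \in \mathcal{V}
  \;\Longrightarrow\; N e^{\tilde{x}} = \sum_{i \in \mathcal{V}} y_i
  \;\Longrightarrow\; \tilde{x} = \ln\Big(\tfrac{1}{N} \sum_{i \in \mathcal{V}} y_i\Big), \\
\text{optimality:}\quad
  & F'(x) = N e^{x} - \sum_{i \in \mathcal{V}} y_i = 0
  \;\Longleftrightarrow\; x = \ln\Big(\tfrac{1}{N} \sum_{i \in \mathcal{V}} y_i\Big) = x^*.
\end{align*}
```

In this instance the conserved quantity is $\sum_{i} e^{\hat{x}_i(k)}$, and any consensus reached while conserving it is forced to equal $x^*$.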

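To this end, observe that (4) holds if and only if the $\hat{x}_i(0)$'s are such that $\sum_{i \in \mathcal{V}} f_i'(\hat{x}_i(0)) = 0$,
and the $\hat{x}_i(k)$'s are related to the $\hat{x}_i(k-1)$'s through

$$\sum_{i \in \mathcal{V}} f_i'(\hat{x}_i(k)) = \sum_{i \in \mathcal{V}} f_i'(\hat{x}_i(k-1)), \quad \forall k \in \mathbb{P}. \tag{6}$$

To satisfy $\sum_{i \in \mathcal{V}} f_i'(\hat{x}_i(0)) = 0$, it suffices that each node $i \in \mathcal{V}$ computes $x_i^*$ on its own and sets

$$\hat{x}_i(0) = x_i^*, \quad \forall i \in \mathcal{V}, \tag{7}$$

since $f_i'(x_i^*) = 0$. To satisfy (6), consider a gossip algorithm, whereby at each iteration $k \in \mathbb{P}$, a
pair $u(k) = \{u_1(k), u_2(k)\} \in \mathcal{E}(k)$ of one-hop neighbors $u_1(k)$ and $u_2(k)$ gossip and update their
$\hat{x}_{u_1(k)}(k)$ and $\hat{x}_{u_2(k)}(k)$, while the rest of the $N$ nodes stay idle, i.e.,

$$\hat{x}_i(k) = \hat{x}_i(k-1), \quad \forall k \in \mathbb{P}, \ \forall i \in \mathcal{V} - u(k). \tag{8}$$

With (8), equation (6) simplifies to

$$f'_{u_1(k)}(\hat{x}_{u_1(k)}(k)) + f'_{u_2(k)}(\hat{x}_{u_2(k)}(k)) = f'_{u_1(k)}(\hat{x}_{u_1(k)}(k-1)) + f'_{u_2(k)}(\hat{x}_{u_2(k)}(k-1)), \quad \forall k \in \mathbb{P}. \tag{9}$$

Hence, all that is needed for (6) to hold is a gossip between nodes $u_1(k)$ and $u_2(k)$ to share their
$f_{u_1(k)}$, $f_{u_2(k)}$, $\hat{x}_{u_1(k)}(k-1)$, and/or $\hat{x}_{u_2(k)}(k-1)$, followed by a joint update of their $\hat{x}_{u_1(k)}(k)$ and
$\hat{x}_{u_2(k)}(k)$, which ensures (9).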

Obviously, (9) alone does not uniquely determine $\hat{x}_{u_1(k)}(k)$ and $\hat{x}_{u_2(k)}(k)$. This suggests that
the available degree of freedom may be used to account for the dissipation condition (5). Unlike
the conservation condition (4), however, (5) is about where the $\hat{x}_i(k)$'s should approach as $k \to \infty$,
which nodes $u_1(k)$ and $u_2(k)$ cannot guarantee themselves since they are only responsible for two
of the $N$ $\hat{x}_i(k)$'s. Nevertheless, given that all the $N$ $\hat{x}_i(k)$'s should approach the same limit, nodes
$u_1(k)$ and $u_2(k)$ can help make this happen by imposing an equalizing condition

$$\hat{x}_{u_1(k)}(k) = \hat{x}_{u_2(k)}(k), \quad \forall k \in \mathbb{P}. \tag{10}$$

With (10) added, there are now two equations with two variables, providing nodes $u_1(k)$ and $u_2(k)$
a chance to uniquely determine $\hat{x}_{u_1(k)}(k)$ and $\hat{x}_{u_2(k)}(k)$ from (9) and (10).

The following proposition asserts that (9) and (10) always have a unique solution, so that the
evolution of the $\hat{x}_i(k)$'s is well-defined:

Proposition 2. With Assumption 1 and (7)-(10), $\hat{x}_i(k)$ $\forall k \in \mathbb{N}$ $\forall i \in \mathcal{V}$ are well-defined, i.e.,
unambiguous and in $\mathcal{X}$. Moreover,
$[\min_{i \in \mathcal{V}} \hat{x}_i(k), \max_{i \in \mathcal{V}} \hat{x}_i(k)] \subset [\min_{i \in \mathcal{V}} \hat{x}_i(k-1), \max_{i \in \mathcal{V}} \hat{x}_i(k-1)]$ $\forall k \in \mathbb{P}$.

Proof. By induction on $k \in \mathbb{N}$. By Assumption 1 and (7), $\hat{x}_i(0)$ $\forall i \in \mathcal{V}$ are unambiguous and in $\mathcal{X}$.
Next, let $k \in \mathbb{P}$ and suppose $\hat{x}_i(k-1)$ $\forall i \in \mathcal{V}$ are unambiguous and in $\mathcal{X}$. We show that so are $\hat{x}_i(k)$
$\forall i \in \mathcal{V}$. From (8), $\hat{x}_i(k)$ $\forall i \in \mathcal{V} - u(k)$ are unambiguous and in $\mathcal{X}$. To show that so are $\hat{x}_{u_1(k)}(k)$
and $\hat{x}_{u_2(k)}(k)$, we show that (9) and (10) have a unique solution $(\hat{x}_{u_1(k)}(k), \hat{x}_{u_2(k)}(k)) \in \mathcal{X}^2$. By
Lemma 1, there is a unique $z \in \mathcal{X}$ such that

$$f'_{u_1(k)}(z) + f'_{u_2(k)}(z) = f'_{u_1(k)}(\hat{x}_{u_1(k)}(k-1)) + f'_{u_2(k)}(\hat{x}_{u_2(k)}(k-1)), \tag{11}$$

which satisfies $z \in [\min_{i \in u(k)} \hat{x}_i(k-1), \max_{i \in u(k)} \hat{x}_i(k-1)]$. Setting $\hat{x}_{u_1(k)}(k) = \hat{x}_{u_2(k)}(k) = z$,
we see that $(\hat{x}_{u_1(k)}(k), \hat{x}_{u_2(k)}(k))$ is a solution to (9) and (10), confirming the existence. Now let
$(a_1, a_2) \in \mathcal{X}^2$ and $(b_1, b_2) \in \mathcal{X}^2$ be two solutions of (9) and (10). Then, due to (10), (9), and
Lemma 1, we have $a_1 = a_2 = b_1 = b_2$, confirming the uniqueness. Therefore, $\hat{x}_i(k)$ $\forall i \in \mathcal{V}$ are
well-defined as desired. Finally, the second statement follows from (8) and the fact that
$\hat{x}_{u_1(k)}(k) = \hat{x}_{u_2(k)}(k) \in [\min_{i \in u(k)} \hat{x}_i(k-1), \max_{i \in u(k)} \hat{x}_i(k-1)]$ $\forall k \in \mathbb{P}$. □

Proposition 2 calls for a few remarks. First, the interval $[\min_{i \in \mathcal{V}} \hat{x}_i(k), \max_{i \in \mathcal{V}} \hat{x}_i(k)]$ can only
shrink or remain unchanged over time $k$. While this does not guarantee the dissipation condition
(5), it shows that the $\hat{x}_i(k)$'s are "trying" to converge and are, at the very least, bounded even if $\mathcal{X}$
is not. Second, the proofs of Proposition 2 and Lemma 1 suggest a simple, practical procedure for
nodes $u_1(k)$ and $u_2(k)$ to solve (9) and (10) for $(\hat{x}_{u_1(k)}(k), \hat{x}_{u_2(k)}(k))$: apply a numerical root-finding
method, such as the bisection method with initial bracket $[\min_{i \in u(k)} \hat{x}_i(k-1), \max_{i \in u(k)} \hat{x}_i(k-1)]$,
to solve (11) for the unique $z$ and then set $\hat{x}_{u_1(k)}(k) = \hat{x}_{u_2(k)}(k) = z$. Finally, since (11) always has
a unique solution $z$, we can eliminate $z$ and write

$$\hat{x}_{u_1(k)}(k) = \hat{x}_{u_2(k)}(k) = (f'_{u_1(k)} + f'_{u_2(k)})^{-1}\big(f'_{u_1(k)}(\hat{x}_{u_1(k)}(k-1)) + f'_{u_2(k)}(\hat{x}_{u_2(k)}(k-1))\big), \quad \forall k \in \mathbb{P}, \tag{12}$$

where $(f_i' + f_j')^{-1} : (f_i' + f_j')(\mathcal{X}) \to \mathcal{X}$ denotes the inverse of the injective function $f_i' + f_j'$ with its
codomain restricted to its range.
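
The root-finding procedure described above can be sketched in Python as follows. The function name `equalize`, the fixed iteration count, and the two example derivatives are assumptions made for illustration; the paper only prescribes a root-finding method such as bisection on the bracket spanned by the two previous estimates.

```python
import math

def equalize(fa_prime, fb_prime, xa, xb, iters=100):
    """Solve fa'(z) + fb'(z) = fa'(xa) + fb'(xb) for z by bisection, cf. (11).

    fa_prime and fb_prime must be continuous and strictly increasing, so the
    left-hand side is strictly increasing in z and the root is unique.
    """
    target = fa_prime(xa) + fb_prime(xb)
    lo, hi = min(xa, xb), max(xa, xb)      # bracket guaranteed by Lemma 1
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if fa_prime(mid) + fb_prime(mid) >= target:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

# Hypothetical pair of nodes: f_a(x) = (x - 1)^2 and f_b(x) = exp(x) - x,
# each starting from its own minimizer.
fa_prime = lambda x: 2.0 * (x - 1.0)
fb_prime = lambda x: math.exp(x) - 1.0
z = equalize(fa_prime, fb_prime, 1.0, 0.0)
```

Both nodes would then set their estimates to `z`, which realizes the update (12) up to the accuracy of the bisection.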

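Expressions (7), (8), and (12) collectively define a gossip-style, distributed asynchronous iterative
algorithm that yields a switched, nonlinear, networked dynamical system

$$\hat{x}_i(k) = \begin{cases} (f'_{u_1(k)} + f'_{u_2(k)})^{-1}\Big(\sum_{j \in u(k)} f_j'(\hat{x}_j(k-1))\Big), & \text{if } i \in u(k), \\ \hat{x}_i(k-1), & \text{otherwise}, \end{cases} \quad \forall k \in \mathbb{P}, \ \forall i \in \mathcal{V}, \tag{13}$$

with initial condition (7) and with $(u(k))_{k=1}^{\infty}$ representing the sequence of gossiping nodes that
trigger the switchings. As this algorithm ensures the conservation condition (4), the state trajectory
$(\hat{x}_1(k), \hat{x}_2(k), \ldots, \hat{x}_N(k))$ must remain on an $(N-1)$-dimensional manifold
$\mathcal{M} = \{(x_1, x_2, \ldots, x_N) \in \mathcal{X}^N : \sum_{i \in \mathcal{V}} f_i'(x_i) = 0\} \subset \mathcal{X}^N \subset \mathbb{R}^N$, making $\mathcal{M}$ an invariant set. Given that the
algorithm involves repeated, pairwise equalizing of the $\hat{x}_i(k)$'s, we refer to it as Pairwise Equalizing
(PE). PE may be expressed in a compact algorithmic form as follows: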

Algorithm 1 (Pairwise Equalizing).

Initialization:

1. Each node $i \in \mathcal{V}$ computes $x_i^* \in \mathcal{X}$, creates a variable $\hat{x}_i \in \mathcal{X}$, and sets $\hat{x}_i \leftarrow x_i^*$.

Operation: At each iteration:

2. A node with one or more one-hop neighbors, say, node $i$, initiates the iteration and selects a
one-hop neighbor, say, node $j$, to gossip. Nodes $i$ and $j$ select one of two ways to gossip by
labeling themselves as either nodes $a$ and $b$, or nodes $b$ and $a$, respectively, where $\{a, b\} =
\{i, j\}$. If node $b$ does not know $f_a$, node $a$ transmits $f_a$ to node $b$. Node $a$ transmits $\hat{x}_a$ to
node $b$. Node $b$ sets $\hat{x}_b \leftarrow (f_a' + f_b')^{-1}(f_a'(\hat{x}_a) + f_b'(\hat{x}_b))$ and transmits $\hat{x}_b$ to node $a$. Node $a$
sets $\hat{x}_a \leftarrow \hat{x}_b$. ■
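
To make the steps of Algorithm 1 concrete, the following Python sketch simulates PE on a small ring network. Everything specific here is an assumption for illustration: the four local functions, the ring topology, the uniformly random choice of the gossiping link, and the helper names `equalize` and `run_pe`. The sketch is centralized for brevity, so it does not model the actual message exchange between nodes $a$ and $b$.

```python
import math
import random

def equalize(da, db, xa, xb, iters=100):
    """Bisection solve of da(z) + db(z) = da(xa) + db(xb), cf. (11)."""
    target = da(xa) + db(xb)
    lo, hi = min(xa, xb), max(xa, xb)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if da(mid) + db(mid) >= target:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

# Hypothetical derivatives f_i' of four strictly convex local functions.
derivs = [
    lambda x: 2.0 * (x - 1.0),        # f_1(x) = (x - 1)^2
    lambda x: math.exp(x) - 1.0,      # f_2(x) = exp(x) - x
    lambda x: 4.0 * x**3 + 2.0 * x,   # f_3(x) = x^4 + x^2
    lambda x: 2.0 * (x + 2.0),        # f_4(x) = (x + 2)^2
]
minimizers = [1.0, 0.0, 0.0, -2.0]    # x_i^*, the zeros of the derivatives
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]  # links of an assumed ring network

def run_pe(num_iters=2000, seed=0):
    rng = random.Random(seed)
    x = list(minimizers)                   # Step 1: each estimate starts at x_i^*
    for _ in range(num_iters):
        a, b = rng.choice(edges)           # Step 2: one pair of neighbors gossips
        z = equalize(derivs[a], derivs[b], x[a], x[b])
        x[a] = x[b] = z                    # both nodes adopt the equalized value
    return x

x_final = run_pe()
```

Because every link of the ring is selected infinitely often with probability one under this random schedule, the estimates are expected to agree and to satisfy $\sum_i f_i'(\hat{x}_i) \approx 0$ at the end of the run.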

Due to space limitations, we omit remarks concerning the execution of Algorithm 1 and refer
the reader to an earlier, conference version of this paper [20].

Notice that PE does not rely on a stepsize parameter to execute, nor does it require the
construction of a (pseudo-)Hamiltonian cycle or the concurrent use of a routing protocol for
multi-hop transmissions. Indeed, all it essentially needs is that every node is capable of applying
a root-finding method, maintaining a list of its one-hop neighbors, and remembering the functions
it learns along the way. Therefore, PE overcomes limitations L1-L3, while being rather easy to
implement, although computationally it is more demanding than the subgradient algorithms.

To show that PE asymptotically converges and, thus, circumvents L4, let $\mathbf{x}^* = (x^*, x^*, \ldots, x^*)$
and $\hat{\mathbf{x}}(k) = (\hat{x}_1(k), \hat{x}_2(k), \ldots, \hat{x}_N(k))$. Then, from Propositions 1 and 2, $\mathbf{x}^* \in \mathcal{X}^N$ and $\hat{\mathbf{x}}(k) \in \mathcal{X}^N$
$\forall k \in \mathbb{N}$. In addition, due to (12), if $\hat{\mathbf{x}}(k) = \mathbf{x}^*$ for some $k \in \mathbb{N}$, then $\hat{\mathbf{x}}(\ell) = \mathbf{x}^*$ $\forall \ell > k$. Hence, $\mathbf{x}^*$ is
an equilibrium point of the system (13). To show that $\lim_{k \to \infty} \hat{\mathbf{x}}(k) = \mathbf{x}^*$, i.e., (3) holds, we seek to
construct a Lyapunov function. To this end, recall that for any strictly convex and differentiable
function $f : \mathcal{X} \to \mathbb{R}$, the first-order convexity condition says that

$$f(y) \ge f(x) + f'(x)(y - x), \quad \forall x, y \in \mathcal{X}, \tag{14}$$

where the equality holds if and only if $x = y$. This suggests the following Lyapunov function
candidate $V : \mathcal{X}^N \subset \mathbb{R}^N \to \mathbb{R}$, which exploits the convexity of the $f_i$'s:

$$V(\hat{\mathbf{x}}(k)) = \sum_{i \in \mathcal{V}} \big[ f_i(x^*) - f_i(\hat{x}_i(k)) - f_i'(\hat{x}_i(k))(x^* - \hat{x}_i(k)) \big]. \tag{15}$$

Notice that $V$ in (15) is well-defined. Moreover, due to Assumption 1 and (14), $V$ is continuous
and positive definite with respect to $\mathbf{x}^*$, i.e., $V(\hat{\mathbf{x}}(k)) \ge 0$ $\forall \hat{\mathbf{x}}(k) \in \mathcal{X}^N$, where the equality holds if
and only if $\hat{\mathbf{x}}(k) = \mathbf{x}^*$. Therefore, to prove (3), it suffices to show that

$$\lim_{k \to \infty} V(\hat{\mathbf{x}}(k)) = 0. \tag{16}$$

The following lemma represents the first step toward establishing (16):

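Lemma 2. Consider the use of PE described in Algorithm 1. Suppose Assumption 1 holds. Then,
for any given $(u(k))_{k=1}^{\infty}$, $(V(\hat{\mathbf{x}}(k)))_{k=0}^{\infty}$ is non-increasing and satisfies

$$V(\hat{\mathbf{x}}(k)) - V(\hat{\mathbf{x}}(k-1)) = -\sum_{i \in u(k)} \big[ f_i(\hat{x}_i(k)) - f_i(\hat{x}_i(k-1)) - f_i'(\hat{x}_i(k-1))(\hat{x}_i(k) - \hat{x}_i(k-1)) \big], \quad \forall k \in \mathbb{P}. \tag{17}$$

Proof. Let $(u(k))_{k=1}^{\infty}$ be given. Then, from (15) and (13), we have
$V(\hat{\mathbf{x}}(k)) - V(\hat{\mathbf{x}}(k-1)) = -\sum_{i \in u(k)} \big[ f_i(\hat{x}_i(k)) - f_i(\hat{x}_i(k-1)) + f_i'(\hat{x}_i(k)) x^* - f_i'(\hat{x}_i(k-1)) x^* - f_i'(\hat{x}_i(k)) \hat{x}_i(k) + f_i'(\hat{x}_i(k-1)) \hat{x}_i(k-1) \big]$
$\forall k \in \mathbb{P}$. Due to (13), $-\sum_{i \in u(k)} f_i'(\hat{x}_i(k)) x^*$ cancels $\sum_{i \in u(k)} f_i'(\hat{x}_i(k-1)) x^*$, while
$\sum_{i \in u(k)} f_i'(\hat{x}_i(k)) \hat{x}_i(k)$ becomes $\sum_{i \in u(k)} f_i'(\hat{x}_i(k-1)) \hat{x}_i(k)$. This proves (17). Note that the
right-hand side of (17) is nonpositive due to (14). Hence, $(V(\hat{\mathbf{x}}(k)))_{k=0}^{\infty}$ is non-increasing. □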
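
The monotonicity asserted by Lemma 2 can be observed numerically. The Python sketch below evaluates the Lyapunov function candidate (15) along a simulated PE run on three fully connected nodes. The local functions, the random gossip schedule, and the helper names `solve_increasing`, `V`, and `lyapunov_trace` are illustrative assumptions, and $x^*$ is computed centrally here only so that $V$ can be evaluated.

```python
import math
import random

# Hypothetical local functions f_i and their derivatives f_i'.
fs = [lambda x: (x - 1.0) ** 2, lambda x: math.exp(x) - x, lambda x: x**4 + x**2]
ds = [lambda x: 2.0 * (x - 1.0), lambda x: math.exp(x) - 1.0,
      lambda x: 4.0 * x**3 + 2.0 * x]
minimizers = [1.0, 0.0, 0.0]

def solve_increasing(h, lo, hi, iters=100):
    """Root of a continuous, strictly increasing h with h(lo) <= 0 <= h(hi)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if h(mid) >= 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

# Optimizer of F = f_1 + f_2 + f_3, needed only to evaluate V.
x_star = solve_increasing(lambda x: sum(d(x) for d in ds), 0.0, 1.0)

def V(x):
    """Lyapunov function candidate (15)."""
    return sum(f(x_star) - f(xi) - d(xi) * (x_star - xi)
               for f, d, xi in zip(fs, ds, x))

def lyapunov_trace(num_iters=200, seed=1):
    rng = random.Random(seed)
    x = list(minimizers)                        # initialization (7)
    trace = [V(x)]
    for _ in range(num_iters):
        a, b = rng.sample(range(3), 2)          # gossiping pair u(k)
        target = ds[a](x[a]) + ds[b](x[b])
        z = solve_increasing(lambda t: ds[a](t) + ds[b](t) - target,
                             min(x[a], x[b]), max(x[a], x[b]))
        x[a] = x[b] = z                         # equalizing update (12)
        trace.append(V(x))
    return trace

trace = lyapunov_trace()
```

Up to floating-point error, `trace` should never increase from one iteration to the next, and under this random schedule it should decay toward zero.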

Lemma 2 has several implications. First, upon completing each iteration $k \in \mathbb{P}$ by any two nodes
$u_1(k)$ and $u_2(k)$, the value of $V$ must either decrease or, at worst, stay the same, where the latter
occurs if and only if $\hat{x}_{u_1(k)}(k-1) = \hat{x}_{u_2(k)}(k-1)$. Second, since $(V(\hat{\mathbf{x}}(k)))_{k=0}^{\infty}$ is non-increasing
irrespective of $(u(k))_{k=1}^{\infty}$, $V$ in (15) may be regarded as a common Lyapunov function for the nonlinear
switched system (13), which has as many as $\frac{N(N-1)}{2}$ different dynamics, corresponding to the $\frac{N(N-1)}{2}$
possible gossiping pairs. Finally, the first-order convexity condition (14) can be used not only to
form the common Lyapunov function $V$, but also to characterize drops in its value in (17) after every
gossip. This is akin to how quadratic functions may be used to form a common Lyapunov function
$V(k) = x^T(k) P x(k)$ for a linear switched system $x(k+1) = A(k) x(k)$, $A(k) \in \{A_1, A_2, \ldots, A_M\}$, as
well as to characterize drops in $V(k)$ via $V(k+1) - V(k) = x^T(k)(A_i^T P A_i - P) x(k) = -x^T(k) Q_i x(k)$.
Indeed, as we will show later, when problem (2) specializes to an averaging problem, where the
nonlinear switched system (13) becomes linear, both $V$ and its drop become quadratic functions.
As $(V(\hat{\mathbf{x}}(k)))_{k=0}^{\infty}$ is nonnegative and non-increasing, $\lim_{k \to \infty} V(\hat{\mathbf{x}}(k))$ exists and is nonnegative.
This, however, is insufficient for us to conclude that $\lim_{k \to \infty} V(\hat{\mathbf{x}}(k)) = 0$, since, for some
pathological gossiping patterns, $\lim_{k \to \infty} V(\hat{\mathbf{x}}(k))$ can be positive (see [20] for examples). Thus, some
restrictions must be imposed on the gossiping pattern, in order to establish (16). To this end, let
$\mathcal{E}_\infty = \{\{i,j\} : u(k) = \{i,j\} \text{ for infinitely many } k \in \mathbb{P}\}$, so that a link $\{i,j\}$ is in $\mathcal{E}_\infty$ if and only if
nodes $i$ and $j$ gossip with each other infinitely often. Then, we may state the following restriction on
the gossiping pattern, which was first adopted in [18] and is not difficult to satisfy in practice [20]:

Assumption 2. The sequence $(u(k))_{k=1}^{\infty}$ is such that the graph $(\mathcal{V}, \mathcal{E}_\infty)$ is connected.

The following theorem says that, under Assumption 2 on the gossiping pattern, PE ensures
asymptotic convergence of all the $\hat{x}_i(k)$'s to $x^*$, circumventing limitation L4:

Theorem 1. Consider the use of PE described in Algorithm 1. Suppose Assumptions 1 and 2 hold.
Then, (16) and (3) hold.

Proof. See Appendix A.1. □

Finally, we point out that the above results may be viewed as a natural generalization of some
known results in distributed averaging. Consider a special case where each node $i \in \mathcal{V}$ observes
not an arbitrary function $f_i$, but a quadratic one of the form $f_i(x) = \frac{1}{2}(x - y_i)^2 + c_i$ with domain
$\mathcal{X} = \mathbb{R}$ and parameters $y_i, c_i \in \mathbb{R}$. In this case, finding the unknown optimizer $x^*$ amounts
to calculating the network-wide average $\frac{1}{N} \sum_{i \in \mathcal{V}} y_i$ of the node "observations" $y_i$'s, so that the
convex optimization problem (2) becomes an averaging problem. In addition, initializing the node
estimates $\hat{x}_i(0)$'s simply means setting them to the $y_i$'s, and equalizing $\hat{x}_{u_1(k)}(k)$ and $\hat{x}_{u_2(k)}(k)$
simply means averaging them, so that PE reduces to Pairwise Averaging [18] and Randomized
Gossip Algorithm [19]. Moreover, the invariant manifold $\mathcal{M}$ becomes the invariant hyperplane
$\mathcal{M} = \{(x_1, x_2, \ldots, x_N) \in \mathbb{R}^N : \sum_{i \in \mathcal{V}} x_i = \sum_{i \in \mathcal{V}} y_i\}$ in distributed averaging. Furthermore, both
the common Lyapunov function $V$ in (15) and its drop in (17) take a quadratic form: $V(\hat{\mathbf{x}}(k)) =
\frac{1}{2}(\hat{\mathbf{x}}(k) - \mathbf{x}^*)^T(\hat{\mathbf{x}}(k) - \mathbf{x}^*)$ and $V(\hat{\mathbf{x}}(k)) - V(\hat{\mathbf{x}}(k-1)) = -\frac{1}{2} \hat{\mathbf{x}}^T(k-1) Q_{u(k)} \hat{\mathbf{x}}(k-1)$ $\forall k \in \mathbb{P}$, where
$Q_{\{i,j\}} \in \mathbb{R}^{N \times N}$ is a symmetric positive semidefinite matrix whose $ii$ and $jj$ entries are $\frac{1}{2}$, $ij$ and $ji$
entries are $-\frac{1}{2}$, and all other entries are zero. Therefore, the first-order-convexity-condition-based
Lyapunov function (15) generalizes the quadratic Lyapunov function in distributed averaging.
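
For completeness, the reduction to averaging can be verified in a few lines. The derivation below only uses the quadratic form of the $f_i$'s stated above together with the PE update; the shorthand $a = u_1(k)$, $b = u_2(k)$ and the symbol $z$ for the common updated value are introduced here for brevity.

```latex
\begin{align*}
f_i'(x) &= x - y_i, \\
(f_a' + f_b')(z) &= f_a'(\hat{x}_a(k-1)) + f_b'(\hat{x}_b(k-1)) \\
\Longleftrightarrow\quad 2z - y_a - y_b &= \hat{x}_a(k-1) + \hat{x}_b(k-1) - y_a - y_b \\
\Longleftrightarrow\quad z &= \tfrac{1}{2}\big(\hat{x}_a(k-1) + \hat{x}_b(k-1)\big), \\[4pt]
V(\hat{\mathbf{x}}(k)) - V(\hat{\mathbf{x}}(k-1))
  &= -\sum_{i \in \{a,b\}} \tfrac{1}{2}\big(z - \hat{x}_i(k-1)\big)^2
   = -\tfrac{1}{4}\big(\hat{x}_a(k-1) - \hat{x}_b(k-1)\big)^2 .
\end{align*}
```

The last expression agrees with the quadratic form built from $Q_{\{a,b\}}$, since $\frac{1}{2}\hat{\mathbf{x}}^T Q_{\{a,b\}} \hat{\mathbf{x}} = \frac{1}{4}(\hat{x}_a - \hat{x}_b)^2$.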

4 Pairwise Bisectioning 

Although PE solves problem (2) and bypasses L1-L4, it requires one-time, one-way sharing of
the $f_i$'s between gossiping nodes, which may be costly for certain $f_i$'s, or impermissible for security
and privacy reasons. In this section, we develop another gossip algorithm that eliminates this
requirement at the expense of more real-number transmissions per iteration.
Note that PE can be traced back to four defining equations (7)-(10), and that its drawback
of having to share the $f_i$'s stems from having to solve (9) and (10). To overcome this drawback,
consider a gossip algorithm satisfying (7)-(9) and a new condition but not (10). Assuming, without
loss of generality, that $\hat{x}_{u_1(k)}(k-1) \le \hat{x}_{u_2(k)}(k-1)$ $\forall k \in \mathbb{P}$, this new condition can be stated as

$$\hat{x}_{u_1(k)}(k-1) \le \hat{x}_{u_1(k)}(k) \le \hat{x}_{u_2(k)}(k) \le \hat{x}_{u_2(k)}(k-1), \quad \forall k \in \mathbb{P}. \tag{18}$$

Termed as the approaching condition, (18) says that at each iteration $k \in \mathbb{P}$, nodes $u_1(k)$ and $u_2(k)$
force $\hat{x}_{u_1(k)}(k)$ and $\hat{x}_{u_2(k)}(k)$ to approach each other while preserving their order. Observe that the
approaching condition (18) includes the equalizing condition (10) as a special case. Furthermore,
unlike (9) and (10), (9) and (18) do not uniquely determine $\hat{x}_{u_1(k)}(k)$ and $\hat{x}_{u_2(k)}(k)$. Rather, they
allow $\hat{x}_{u_1(k)}(k)$ and $\hat{x}_{u_2(k)}(k)$ to increase gradually from $\hat{x}_{u_1(k)}(k-1)$ and decrease accordingly from
$\hat{x}_{u_2(k)}(k-1)$, respectively, until the two become equal.

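The following lemma characterizes the impact of the non-uniqueness on the value of $V$:

Lemma 3. Consider (7)-(9) and (18). Suppose Assumption 1 holds. Then, for any given
$(u(k))_{k=1}^{\infty}$, $(V(\hat{\mathbf{x}}(k)))_{k=0}^{\infty}$ is non-increasing. Moreover, for any given $k \in \mathbb{P}$ and $\hat{\mathbf{x}}(k-1) \in \mathcal{X}^N$,
$V(\hat{\mathbf{x}}(k))$ strictly increases with $\hat{x}_{u_2(k)}(k) - \hat{x}_{u_1(k)}(k)$ over $[0, \hat{x}_{u_2(k)}(k-1) - \hat{x}_{u_1(k)}(k-1)]$.

Proof. Let $(u(k))_{k=1}^{\infty}$ be given. Then, from (15), (8), and (9), we have
$V(\hat{\mathbf{x}}(k)) - V(\hat{\mathbf{x}}(k-1)) = -\sum_{i \in u(k)} \big[ f_i(\hat{x}_i(k)) - f_i(\hat{x}_i(k-1)) - f_i'(\hat{x}_i(k-1))(\hat{x}_i(k) - \hat{x}_i(k-1)) + (f_i'(\hat{x}_i(k-1)) - f_i'(\hat{x}_i(k))) \hat{x}_i(k) \big]$
$\forall k \in \mathbb{P}$. Due to (9) and (18),
$\sum_{i \in u(k)} (f_i'(\hat{x}_i(k-1)) - f_i'(\hat{x}_i(k))) \hat{x}_i(k) = \big(f'_{u_1(k)}(\hat{x}_{u_1(k)}(k-1)) - f'_{u_1(k)}(\hat{x}_{u_1(k)}(k))\big)\big(\hat{x}_{u_1(k)}(k) - \hat{x}_{u_2(k)}(k)\big) \ge 0$.
This, along with (14), implies $V(\hat{\mathbf{x}}(k)) - V(\hat{\mathbf{x}}(k-1)) \le 0$
$\forall k \in \mathbb{P}$. Now let $k \in \mathbb{P}$ and $\hat{\mathbf{x}}(k-1) \in \mathcal{X}^N$ be given. By Lemma 1, there exists a unique $x_{eq} \in \mathcal{X}$
such that $\sum_{i \in u(k)} f_i'(x_{eq}) = \sum_{i \in u(k)} f_i'(\hat{x}_i(k))$. Also, $x_{eq} \in [\hat{x}_{u_1(k)}(k), \hat{x}_{u_2(k)}(k)]$. Let $\mathbf{x}_{eq} \in \mathcal{X}^N$ be
such that its $i$th entry is $x_{eq}$ if $i \in u(k)$ and $\hat{x}_i(k-1)$ otherwise. Then, it follows from (15), (9),
and (14) that $V(\hat{\mathbf{x}}(k)) - V(\mathbf{x}_{eq}) = \sum_{i \in u(k)} \big[ f_i(x_{eq}) - f_i(\hat{x}_i(k)) - f_i'(\hat{x}_i(k))(x_{eq} - \hat{x}_i(k)) \big] \ge 0$. Because
$f_i(y) - f_i(x) - f_i'(x)(y - x)$ strictly increases with $|y - x|$ for each fixed $y \in \mathcal{X}$ $\forall i \in \mathcal{V}$ and because
of (9) and (18), the second claim is true. □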

Lemma 3 says that the value of $V$ can never increase. In addition, the closer $\hat{x}_{u_1(k)}(k)$ and
$\hat{x}_{u_2(k)}(k)$ get, the larger the drop in the value of $V$, and the drop is maximized when $\hat{x}_{u_1(k)}(k)$ and
$\hat{x}_{u_2(k)}(k)$ are equalized. These observations suggest that perhaps it is possible to design an algorithm
that only forces $\hat{x}_{u_1(k)}(k)$ and $\hat{x}_{u_2(k)}(k)$ to approach each other (as opposed to becoming equal), at
the cost of a smaller drop in the value of $V$, but with the benefit of not having to share the $f_i$'s.
The following algorithm, referred to as Pairwise Bisectioning (PB), shows that this is indeed the
case and utilizes a bisection step that allows $\hat{x}_{u_1(k)}(k)$ and $\hat{x}_{u_2(k)}(k)$ to get arbitrarily close:

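Algorithm 2 (Pairwise Bisectioning).

Initialization:

1. Each node $i \in \mathcal{V}$ computes $x_i^* \in \mathcal{X}$, creates variables $\hat{x}_i, a_i, b_i \in \mathcal{X}$, and sets $\hat{x}_i \leftarrow x_i^*$.

Operation: At each iteration:

2. A node with one or more one-hop neighbors, say, node $i$, initiates the iteration and selects a
one-hop neighbor, say, node $j$, to gossip. Node $i$ transmits $\hat{x}_i$ to node $j$. Node $j$ sets $a_j \leftarrow
\min\{\hat{x}_i, \hat{x}_j\}$ and $b_j \leftarrow \max\{\hat{x}_i, \hat{x}_j\}$ and transmits $\hat{x}_j$ to node $i$. Node $i$ sets $a_i \leftarrow \min\{\hat{x}_i, \hat{x}_j\}$
and $b_i \leftarrow \max\{\hat{x}_i, \hat{x}_j\}$. Nodes $i$ and $j$ select the number of bisection rounds $R \in \mathbb{P}$.

3. Repeat the following $R$ times: Node $j$ transmits $f_j'(\frac{a_j + b_j}{2}) - f_j'(\hat{x}_j)$ to node $i$. Node $i$ tests if
$f_j'(\frac{a_j + b_j}{2}) - f_j'(\hat{x}_j) + f_i'(\frac{a_i + b_i}{2}) - f_i'(\hat{x}_i) \ge 0$. If so, node $i$ sets $b_i \leftarrow \frac{a_i + b_i}{2}$ and transmits LEFT
to node $j$, and node $j$ sets $b_j \leftarrow \frac{a_j + b_j}{2}$. Otherwise, node $i$ sets $a_i \leftarrow \frac{a_i + b_i}{2}$ and transmits
RIGHT to node $j$, and node $j$ sets $a_j \leftarrow \frac{a_j + b_j}{2}$. End repeat.

4. Node $j$ transmits $f_j'(c_j) - f_j'(\hat{x}_j)$ to node $i$, where $c_j = a_j$ if $\hat{x}_j \le \hat{x}_i$ and $c_j = b_j$ otherwise.
Node $i$ tests if $(f_j'(c_j) - f_j'(\hat{x}_j) + f_i'(c_i) - f_i'(\hat{x}_i))(\hat{x}_i - \hat{x}_j) \ge 0$, where $c_i = a_i$ if $\hat{x}_i \le \hat{x}_j$ and
$c_i = b_i$ otherwise. If so, node $i$ sets $\hat{x}_i \leftarrow (f_i')^{-1}(f_i'(\hat{x}_i) - f_j'(c_j) + f_j'(\hat{x}_j))$ and node $j$ sets
$\hat{x}_j \leftarrow c_j$. Otherwise, node $i$ transmits $f_i'(c_i) - f_i'(\hat{x}_i)$ to node $j$ and sets $\hat{x}_i \leftarrow c_i$, and node $j$ sets
$\hat{x}_j \leftarrow (f_j')^{-1}(f_j'(\hat{x}_j) - f_i'(c_i) + f_i'(\hat{x}_i))$. ■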
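
One gossip of PB between two nodes can be sketched in Python as below. The function names `invert_increasing` and `pb_gossip`, the numerical inversion of $f_i'$ by bisection over the final bracket, and the example derivatives are assumptions made for illustration. The sketch also runs both nodes in one process, so the messages of Steps 2 to 4 appear as local variables rather than transmissions, and the rule used for $c_i$ and $c_j$ (each node takes the bracket endpoint on its own side) is a reading of Step 4 that should be checked against the original algorithm statement.

```python
import math

def invert_increasing(g, value, lo, hi, iters=200):
    """Return z in [lo, hi] with g(z) = value, for continuous increasing g."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(mid) >= value:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

def pb_gossip(fi_p, fj_p, xi, xj, R):
    """One PB iteration for nodes i and j; returns their new estimates."""
    # Step 2: both nodes form the same initial bracket.
    a, b = min(xi, xj), max(xi, xj)
    # Step 3: R bisection rounds; node j reveals only f_j'(mid) - f_j'(xj).
    for _ in range(R):
        mid = 0.5 * (a + b)
        msg_j = fj_p(mid) - fj_p(xj)
        if msg_j + fi_p(mid) - fi_p(xi) >= 0.0:
            b = mid              # LEFT: the equalized value lies in [a, mid]
        else:
            a = mid              # RIGHT: the equalized value lies in [mid, b]
    # Step 4: one node jumps to its bracket endpoint, the other restores (9).
    cj = a if xj <= xi else b
    ci = a if xi <= xj else b
    msg_j = fj_p(cj) - fj_p(xj)
    if (msg_j + fi_p(ci) - fi_p(xi)) * (xi - xj) >= 0.0:
        new_xi = invert_increasing(fi_p, fi_p(xi) - msg_j, a, b)
        new_xj = cj
    else:
        msg_i = fi_p(ci) - fi_p(xi)
        new_xi = ci
        new_xj = invert_increasing(fj_p, fj_p(xj) - msg_i, a, b)
    return new_xi, new_xj

# Hypothetical pair: f_i(x) = (x - 1)^2 and f_j(x) = exp(x) - x.
fi_p = lambda x: 2.0 * (x - 1.0)
fj_p = lambda x: math.exp(x) - 1.0
new_xi, new_xj = pb_gossip(fi_p, fj_p, 1.0, 0.0, R=3)
```

In this example neither node ever evaluates the other's function: node $i$ only sees differences of $f_j'$ values, which is the point of PB.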

Notice that Step 1 of PB is identical to that of PE except that each node $i \in \mathcal{V}$ creates
two additional variables, $a_i$ and $b_i$, which are used in Step 2 to represent the initial bracket
$[a_i, b_i] = [a_j, b_j] = [\min\{\hat{x}_i, \hat{x}_j\}, \max\{\hat{x}_i, \hat{x}_j\}]$ for bisection purposes. Step 3 describes execution
of the bisection method, where $R \in \mathbb{P}$ denotes the number of bisection rounds, which may be
different for each iteration (e.g., a large $R$ may be advisable when $\hat{x}_i$ and $\hat{x}_j$ are very different).
Observe that upon completing Step 3, $x_{eq} \in [a_i, b_i] = [a_j, b_j] \subset [\min\{\hat{x}_i, \hat{x}_j\}, \max\{\hat{x}_i, \hat{x}_j\}]$ and
$b_i - a_i = \frac{1}{2^R} |\hat{x}_i - \hat{x}_j|$, where $x_{eq}$ denotes the equalized value of $\hat{x}_i$ and $\hat{x}_j$ if PE were used.
Moreover, upon completing Step 4, $x_{eq} \in [\min\{\hat{x}_i, \hat{x}_j\}, \max\{\hat{x}_i, \hat{x}_j\}] \subset [a_i, b_i] = [a_j, b_j]$, where $\hat{x}_i$
and $\hat{x}_j$ here represent new values. Therefore, upon completing each iteration $k \in \mathbb{P}$,

$$|\hat{x}_{u_1(k)}(k) - \hat{x}_{u_2(k)}(k)| \le \frac{1}{2^R} |\hat{x}_{u_1(k)}(k-1) - \hat{x}_{u_2(k)}(k-1)|, \quad \forall k \in \mathbb{P}. \tag{19}$$

Finally, note that unlike PE, which requires two real-number transmissions per iteration, PB requires
as many as $3 + R$ or $4 + R$. However, it allows the nodes to never share their $f_i$'s.

The following theorem establishes the asymptotic convergence of PB under Assumption 2:

Theorem 2. Consider the use of PB described in Algorithm 2. Suppose Assumptions 1 and 2 hold.
Then, (16) and (3) hold.

Proof. See Appendix A.2. □

As follows from the above, PB represents an alternative to PE, which is useful when nodes
are either unable, or unwilling, to share their $f_i$'s. Although not pursued here, it is straightforward
to see that PE and PB may be combined, so that equalizing is used when one of the gossiping
nodes can send the other its $f_i$, and approaching is used when neither of them can.
5 Conclusion

In this paper, based on the ideas of conservation and dissipation, we have developed PE and PB, 
two non-gradient-based gossip algorithms that enable nodes to cooperatively solve a class of convex 
optimization problems over networks. Using Lyapunov stability theory and the convexity structure, 
we have shown that PE and PB are asymptotically convergent, provided that the gossiping pattern 
is sufficiently rich. We have also discussed several salient features of PE and PB, including their 
comparison with the subgradient algorithms and their connection with distributed averaging. 

A Appendix 

A.1 Proof of Theorem 1

Suppose Assumption 1 holds and let $(u(k))_{k=1}^{\infty}$ satisfying Assumption 2 be given. Consider the
following lemmas:

Lemma 4. Suppose Assumption 1 holds. Then, $\forall [a,b] \subset \mathcal{X}$, there exists a continuous and strictly increasing function $\gamma : [0,\infty) \to [0,\infty)$ satisfying $\gamma(0) = 0$ and $\lim_{d\to\infty} \gamma(d) = \infty$, such that $\forall \eta > 0$, $\forall i \in \mathcal{V}$, $\forall (x,y) \in [a,b]^2$, $f_i(y) - f_i(x) - f_i'(x)(y - x) \le \eta$ implies $|y - x| \le \gamma^{-1}(\eta)$. 

Proof. Let $[a,b] \subset \mathcal{X}$. For each $i \in \mathcal{V}$, define $g_i : [a,b]^2 \to \mathbb{R}$ as $g_i(x,y) = f_i(y) - f_i(x) - f_i'(x)(y - x)$. Due to Assumption 1 and (14), $g_i(x,y) \ge 0$ $\forall (x,y) \in [a,b]^2$, where the equality holds if and only if $x = y$. Moreover, since $f_i'$ is strictly increasing and $g_i(x,y)$ can be written as $g_i(x,y) = \int_x^y (f_i'(t) - f_i'(x))\,dt$, $g_i(x,y)$ is strictly increasing with $|y - x|$ for each fixed $x \in [a,b]$. Furthermore, because $f_i$ and $f_i'$ are continuous, $g_i$ is continuous. Next, for each $d \in [0, b-a]$, let $\mathcal{K}(d) = \{(x,y) \in [a,b]^2 : |y - x| = d\}$. Also, for each $i \in \mathcal{V}$, define $\gamma_i : [0, b-a] \to \mathbb{R}$ as $\gamma_i(d) = \min_{(x,y) \in \mathcal{K}(d)} g_i(x,y)$. Due to the compactness of $\mathcal{K}(d)$ $\forall d \in [0, b-a]$ and the continuity of $g_i$, $\gamma_i$ is well-defined and continuous. In addition, since $g_i(x,y) = 0$ $\forall (x,y) \in \mathcal{K}(0)$, $\gamma_i(0) = 0$. Now pick any $d_1$ and $d_2$ such that $0 \le d_1 < d_2 \le b - a$. Let $(x_2, y_2) \in \mathcal{K}(d_2)$ be such that $\gamma_i(d_2) = g_i(x_2, y_2)$. If $y_2 > x_2$, then $y_2 - x_2 = d_2$. In this case, $\exists y_1 \in [x_2, y_2)$ such that $y_1 - x_2 = d_1$. Since $g_i(x_2, y)$ is strictly increasing with $y$ for $y \ge x_2$, we have $\gamma_i(d_1) \le g_i(x_2, y_1) < g_i(x_2, y_2) = \gamma_i(d_2)$. Similarly, if $y_2 < x_2$, we also have $\gamma_i(d_1) < \gamma_i(d_2)$. Hence, $\gamma_i$ is strictly increasing. Finally, define $\gamma : [0,\infty) \to [0,\infty)$ as 

$$\gamma(d) = \begin{cases} \min_{i \in \mathcal{V}} \gamma_i(d), & \text{if } d \in [0, b-a], \\ \min_{i \in \mathcal{V}} \gamma_i(b-a) + d - (b-a), & \text{if } d \in (b-a, \infty). \end{cases}$$

Note that $\gamma(0) = 0$ since $\gamma_i(0) = 0$ $\forall i \in \mathcal{V}$, and that $\lim_{d\to\infty} \gamma(d) = \infty$. Moreover, since $\gamma_i$ is continuous and strictly increasing $\forall i \in \mathcal{V}$, so is $\gamma$ on $[0, b-a]$. Also, observe that $\gamma$ is continuous and strictly increasing on $[b-a, \infty)$. Thus, $\gamma$ is continuous and strictly increasing. Now let $\eta > 0$, $i \in \mathcal{V}$, and $(x,y) \in [a,b]^2$. Suppose $g_i(x,y) \le \eta$. If $\eta \le \gamma(b-a)$, then $|y - x| \le \gamma^{-1}(\eta)$ because $\gamma(|y - x|) \le \gamma_i(|y - x|) \le g_i(x,y) \le \eta$. If $\eta > \gamma(b-a)$, then $|y - x| \le b - a < \gamma^{-1}(\eta)$. □ 
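
As a worked instance of the construction in Lemma 4, consider the hypothetical quadratic family $f_i(x) = \tfrac{w_i}{2}(x - m_i)^2$ with $w_i > 0$. The weights $w_i$ and centers $m_i$ are illustrative symbols that do not appear in the paper. In this special case every quantity in the proof has a closed form:

```latex
\begin{align*}
g_i(x,y) &= f_i(y) - f_i(x) - f_i'(x)(y-x) = \tfrac{w_i}{2}(y-x)^2, \\
\gamma_i(d) &= \min_{(x,y)\in\mathcal{K}(d)} g_i(x,y) = \tfrac{w_i}{2}\,d^2, \\
\gamma(d) &= \tfrac{w_{\min}}{2}\,d^2 \quad \text{for } d \in [0,b-a],
  \qquad w_{\min} = \min_{i\in\mathcal{V}} w_i, \\
\gamma^{-1}(\eta) &= \sqrt{2\eta / w_{\min}} \quad \text{for } \eta \le \gamma(b-a).
\end{align*}
```

Here $g_i(x,y) \le \eta$ gives $|y - x| \le \sqrt{2\eta/w_i} \le \sqrt{2\eta/w_{\min}} = \gamma^{-1}(\eta)$, which is the implication asserted by the lemma.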
Lemma 5. Suppose Assumption 1 holds. Then, $\forall [a,b] \subset \mathcal{X}$, $\exists \beta \in (0,\infty)$ such that $\forall i \in \mathcal{V}$, $\forall (x,y) \in [a,b]^2$, $f_i(y) - f_i(x) - f_i'(x)(y - x) \le \beta|y - x|$. 

Proof. [...] $\beta < \infty$. Let $i \in \mathcal{V}$ and $(x,y) \in [a,b]^2$. Since $f_i$ is continuously differentiable, by the Mean Value Theorem, $\exists c$ between $x$ and $y$ such that $f_i(y) - f_i(x) = f_i'(c)(y - x)$. This, along with the triangle inequality and the fact that $f_i'$ is strictly increasing, implies that 

$$f_i(y) - f_i(x) - f_i'(x)(y - x) = (f_i'(c) - f_i'(x))(y - x) \le |f_i'(c) - f_i'(x)| \cdot |y - x| \le (|f_i'(c)| + |f_i'(x)|)\,|y - x| \le 2\max\{|f_i'(a)|, |f_i'(b)|\} \cdot |y - x| \le \beta|y - x|.$$ □ 
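
The bound in Lemma 5 can be spot-checked numerically. The sketch below is only an illustration: the three local functions are hypothetical, and the constant $\beta = 2\max_{i}\max\{|f_i'(a)|, |f_i'(b)|\}$ is an assumed choice that is consistent with the chain of inequalities in the proof, not necessarily the constant used by the authors.

```python
import math
import random

# Hypothetical local functions (not from the paper), each given as (f, f').
# All are strictly convex and continuously differentiable.
FUNCS = [
    (lambda x: math.exp(x) - x, lambda x: math.exp(x) - 1.0),
    (lambda x: (x - 2.0) ** 2, lambda x: 2.0 * (x - 2.0)),
    (lambda x: x ** 4 + x ** 2, lambda x: 4.0 * x ** 3 + 2.0 * x),
]


def bregman(f, df, x, y):
    """g_i(x, y) = f_i(y) - f_i(x) - f_i'(x) (y - x)."""
    return f(y) - f(x) - df(x) * (y - x)


def beta_bound(a, b):
    """Assumed constant: 2 * max_i max(|f_i'(a)|, |f_i'(b)|)."""
    return 2.0 * max(max(abs(df(a)), abs(df(b))) for _, df in FUNCS)


def worst_ratio(a, b, samples=20000, seed=0):
    """Largest sampled value of g_i(x, y) / |y - x| over [a, b]^2 and all i."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(samples):
        x, y = rng.uniform(a, b), rng.uniform(a, b)
        if x == y:
            continue  # the ratio is undefined on the diagonal
        for f, df in FUNCS:
            worst = max(worst, bregman(f, df, x, y) / abs(y - x))
    return worst
```

Calling `worst_ratio(-1.0, 3.0)` and comparing it with `beta_bound(-1.0, 3.0)` gives a quick sanity check that the sampled ratios stay below the assumed constant on that interval.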



Let $a = \min_{i \in \mathcal{V}} x_i(0)$ and $b = \max_{i \in \mathcal{V}} x_i(0)$. Then, it follows from Proposition 2 that $x_i(k) \in [a,b] \subset \mathcal{X}$ $\forall k \in \mathbb{N}$ $\forall i \in \mathcal{V}$ and from (?) and Lemma 1 that $x^* \in [a,b]$. By Lemma 4, there exists a continuous and strictly increasing function $\gamma : [0,\infty) \to [0,\infty)$ satisfying $\gamma(0) = 0$ and $\lim_{d\to\infty} \gamma(d) = \infty$, such that $\forall \eta > 0$, $\forall i \in \mathcal{V}$, $\forall (x,y) \in [a,b]^2$, $f_i(y) - f_i(x) - f_i'(x)(y - x) \le \eta$ implies $|y - x| \le \gamma^{-1}(\eta)$. Also, by Lemma 5, $\exists \beta \in (0,\infty)$ such that $\forall i \in \mathcal{V}$, $\forall (x,y) \in [a,b]^2$, $f_i(y) - f_i(x) - f_i'(x)(y - x) \le \beta|y - x|$. From Lemma 2, $(V(x(k)))_{k=0}^{\infty}$ is nonnegative and non-increasing. Thus, $\exists c \ge 0$ such that $\lim_{k\to\infty} V(x(k)) = c$. To show that $c$ must be zero, assume, to the contrary, that $c > 0$. Let $\epsilon > 0$ be given by $\epsilon = \gamma\big(\frac{c}{4\beta N(N-1)}\big)$. Then, $\exists k_1 \in \mathbb{N}$ such that 

$$c \le V(x(k)) < c + \epsilon, \quad \forall k \ge k_1. \tag{20}$$

Due to (20), $V(x(k-1)) - V(x(k)) < \epsilon$ $\forall k \ge k_1 + 1$. Hence, from (11) and (14), $f_i(x_i(k)) - f_i(x_i(k-1)) - f_i'(x_i(k-1))(x_i(k) - x_i(k-1)) \le \epsilon$ $\forall k \ge k_1 + 1$ $\forall i \in u(k)$. As a result, $|x_i(k) - x_i(k-1)| \le \gamma^{-1}(\epsilon)$ $\forall k \ge k_1 + 1$ $\forall i \in u(k)$. Because of this and (10), 

$$|x_i(k) - x_j(k)| \le 2\gamma^{-1}(\epsilon), \quad \forall k \ge k_1, \ \forall i, j \in u(k+1). \tag{21}$$

Now suppose $\max_{i \in \mathcal{V}} x_i(k_1) - \min_{i \in \mathcal{V}} x_i(k_1) > 2(N-1)\gamma^{-1}(\epsilon)$. Then, $\exists p, q \in \mathcal{V}$ such that $x_q(k_1) - x_p(k_1) > 2\gamma^{-1}(\epsilon)$ and $C_1 \cup C_2 = \mathcal{V}$, where $C_1 = \{i \in \mathcal{V} : x_i(k_1) \le x_p(k_1)\}$ and $C_2 = \{i \in \mathcal{V} : x_i(k_1) \ge x_q(k_1)\}$. Next, we show by induction that $\forall k \ge k_1$, $x_i(k) \le x_p(k_1)$ $\forall i \in C_1$ and $x_i(k) \ge x_q(k_1)$ $\forall i \in C_2$. Clearly, the statement is true for $k = k_1$. For $k \ge k_1 + 1$, suppose $x_i(k-1) \le x_p(k_1)$ $\forall i \in C_1$ and $x_i(k-1) \ge x_q(k_1)$ $\forall i \in C_2$. Then, due to (21), $\forall i \in C_1$, $\forall j \in C_2$, $\{i,j\} \ne u(k)$, i.e., $u(k) \subset C_1$ or $u(k) \subset C_2$. It follows from (13) and Lemma 1 that $x_i(k) \le x_p(k_1)$ $\forall i \in C_1$ and $x_i(k) \ge x_q(k_1)$ $\forall i \in C_2$, completing the induction. Due again to (21), we have $\forall i \in C_1$, $\forall j \in C_2$, $\{i,j\} \ne u(k)$ $\forall k \ge k_1 + 1$, which violates Assumption 2. Consequently, $\max_{i \in \mathcal{V}} x_i(k_1) - \min_{i \in \mathcal{V}} x_i(k_1) \le 2(N-1)\gamma^{-1}(\epsilon)$. It follows from (?) and Lemma 1 that $|x^* - x_i(k_1)| \le \max_{j \in \mathcal{V}} x_j(k_1) - \min_{j \in \mathcal{V}} x_j(k_1) \le 2(N-1)\gamma^{-1}(\epsilon)$ $\forall i \in \mathcal{V}$. Hence, $V(x(k_1)) \le \beta \sum_{i \in \mathcal{V}} |x^* - x_i(k_1)| \le \beta \cdot N \cdot 2(N-1)\gamma^{-1}(\epsilon) < c$, which contradicts (20). Therefore, $c = 0$, i.e., $\lim_{k\to\infty} V(x(k)) = 0$, implying that (3) is satisfied. 

A.2 Proof of Theorem 2 

The proof is similar to that of Theorem 1. Let $a$, $b$, $\gamma$, and $\beta$ be as defined in Appendix A.1. Then, due to (?), (15), (?), and Lemma 1, we have $x_i(k) \in [a,b]$ $\forall k \in \mathbb{N}$ $\forall i \in \mathcal{V}$ and $x^* \in [a,b]$. From Lemma 3, $\lim_{k\to\infty} V(x(k)) = c$ for some $c \ge 0$. To show that $c = 0$, assume to the contrary that $c > 0$ and let $\epsilon$ be as defined in Appendix A.1. Then, (20) holds for some $k_1 \in \mathbb{N}$. It follows from the proof of Lemma 3 that $f_i(x_i(k)) - f_i(x_i(k-1)) - f_i'(x_i(k-1))(x_i(k) - x_i(k-1)) \le \epsilon$ $\forall k \ge k_1 + 1$ $\forall i \in u(k)$. Thus, $|x_i(k) - x_i(k-1)| \le \gamma^{-1}(\epsilon)$ $\forall k \ge k_1 + 1$ $\forall i \in u(k)$. This, along with (15) and the fact that $R \in \mathbb{P}$, implies $|x_i(k) - x_j(k)| \le 4\gamma^{-1}(\epsilon)$ $\forall k \ge k_1$ $\forall i, j \in u(k+1)$. Then, using the same idea as in Appendix A.1, it can be shown that $\max_{i \in \mathcal{V}} x_i(k_1) - \min_{i \in \mathcal{V}} x_i(k_1) \le 4(N-1)\gamma^{-1}(\epsilon)$. This leads to $V(x(k_1)) < c$, which contradicts (20). Therefore, (16) and (3) hold. 
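
The Lyapunov argument used in both proofs can be observed on a small simulation. The sketch below is an illustration under stated assumptions, not the authors' implementation: it takes hypothetical quadratic local functions $f_i(x) = \tfrac{w_i}{2}(x - m_i)^2$, assumes each gossiping pair moves to the common value that keeps the sum of their derivatives unchanged (for quadratics, a weighted average), and draws pairs uniformly at random from a complete graph. It records $V(x) = \sum_i [f_i(x^*) - f_i(x_i) - f_i'(x_i)(x^* - x_i)]$, which for these quadratics equals $\sum_i \tfrac{w_i}{2}(x^* - x_i)^2$.

```python
import random


def pe_quadratic(w, m, steps=2000, seed=0):
    """Simulate pairwise equalizing on f_i(x) = 0.5 * w[i] * (x - m[i])**2.

    Assumptions (illustrative): w[i] > 0, complete graph, uniformly random
    gossiping pairs. Returns (final states, x_star, history of V).
    """
    n = len(w)
    x_star = sum(wi * mi for wi, mi in zip(w, m)) / sum(w)
    x = list(m)  # each node starts at the minimizer of its own f_i

    def lyapunov(states):
        # For quadratics the Bregman-type sum reduces to a weighted
        # squared distance from the optimum.
        return sum(0.5 * wi * (x_star - xi) ** 2 for wi, xi in zip(w, states))

    rng = random.Random(seed)
    history = [lyapunov(x)]
    for _ in range(steps):
        i, j = rng.sample(range(n), 2)
        # Common value z with w_i (z - m_i) + w_j (z - m_j) equal to
        # w_i (x_i - m_i) + w_j (x_j - m_j): a weighted average.
        z = (w[i] * x[i] + w[j] * x[j]) / (w[i] + w[j])
        x[i] = x[j] = z
        history.append(lyapunov(x))
    return x, x_star, history
```

With equal weights the update reduces to plain pairwise averaging. In runs of this sketch the recorded values of $V$ should never increase, up to floating-point rounding, and the states should cluster around $x^*$.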